HU-EP-12/16 



Comparing efficient computation methods for 
massless QCD tree amplitudes: 
Closed Analytic Formulae versus Berends-Giele Recursion 

Simon Badger 1 , Benedikt Biedermann 2 , Lucas Hackl 2 3 , 
Jan Plefka 2 , Theodor Schuster 2 and Peter Uwer 2 

1 The Niels Bohr International Academy and Discovery Center 
The Niels Bohr Institute 
Blegdamsvej 17, DK-2100 Copenhagen, Denmark 

2 Institutfur Physik, Humboldt-Universitdt zu Berlin, 
Newtonstrafie 15, D-12489 Berlin, Germany 

3 Perimeter Institute for Theoretical Physics, 
Waterloo, Ontario N2L 2Y5, Canada 



Abstract 

Recent advances in our understanding of tree-level QCD amplitudes in the massless 
limit exploiting an effective (maximal) supersymmetry have led to the complete ana- 
lytic construction of tree-amplitudes with up to four external quark-anti-quark pairs. In 
this work we compare the numerical efficiency of evaluating these closed analytic for- 
mulae to a numerically efficient implementation of the Berends-Giele recursion. We 
compare calculation times for tree-amplitudes with parton numbers ranging from 4 to 
25 with no, one, two and three external quark lines. We find that the exact results are 
generally faster in the case of MHV and NMHV amplitudes. Starting with the NN- 
MHV amplitudes the Berends-Giele recursion becomes more efficient. In addition to 
the runtime we also compared the numerical accuracy. The analytic formulae are on 
average more accurate than the off-shell recursion relations though both are well suited 
for complicated phenomenological applications. In both cases we observe a reduction 
in the average accuracy when phase space configurations close to singular regions are 
evaluated. We believe that the above findings provide valuable information to select the 
right method for phenomenological applications. 



Contents 



1 Introduction 



2 



2 Description of used methods 

2.1 Closed analytic formulae 

2.2 Berends-Giele recursion 



4 

4 
7 



3 Performance and Numerical Accuracy 

3.1 Evaluation Time 

3.2 Numerical Accuracy 



9 

m 



4 Conclusions 



18 



1 Introduction 



Numerically fast and accurate computation methods for multi-parton tree-level amplitudes in 
QCD are of great importance from many points of view. They crucially enter the theoretical 
prediction for cross sections of multi-jet processes at leading order (LO) in the QCD coupling 
a s as they occur at present particle colliders such as the Large Hadron Collider (LHC). Here 
a variety of computer programs based on the numerical evaluation of Feynman diagrams have 
been developed in the past see for example [1|. With the LHC data from the year 2011 jet 
multiplicities of up to 9 jets in the final state are probed. With the data of the year 2012, LHC 
will be able to investigate jet multiplicities of up to 12 jets. However, for high multiplicities, 
the conventional Feynman diagram based approach quickly reaches its limit, for example 8 
jets in the final state would require already the evaluation of more than 10 7 Feynman dia- 
grams! Hence more efficient methods are needed. Here important progress has been made in 
recent years based on recursive on-shell methods. Moreover, QCD tree-amplitudes are cru- 
cially needed for the computation of one-loop corrections, when these are constructed using a 
numerical implementation of generalized unitarity (for recent reviews see refs. [2]). Recently 
rapid progress has been made in developing and automating the generalized unitarity and in- 
tegrand reduction approaches to computation of loop amplitudes [3,4|. These techniques have 
made NLO predicitons for multi-jet final states at hadron colliders feasible for up to 2 — > 5 
processes (for recent results see for example [5]). On the formal side a number of new methods 
have been devised for the computation of scattering amplitudes in SU (N c ) gauge theories with 
particular emphasis on the maximally supersymmetric (fA£ = 4) Yang-Mills theory. Among 
a host of other developments the Britto, Cachazo, Feng and Witten (BCFW) recursion rela- 
tion [6] was developed which uses only on-shell lower-point amplitudes evaluated at complex 
momenta to construct the desired higher-point trees in any gauge theory. The BCFW recursion 
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was then recast as a super-recursion for super-amplitudes in the maximally supersymmetric 
case of (fA£ = 4) Yang-Mills theory [7]. This super-recursion could then be solved for arbi- 
trary external states by Drummond and Henn in [8] leading to closed analytic formulae for 
all super-amplitudes at tree-level. The projection of the super-amplitudes on the component 
field level yielding fA£ = 4 super Yang-Mills theory amplitudes with external gluons, gluinos 
and scalars may be obtained upon suitable Grassmann integrations. This was done recently 
by Dixon, Henn and two of the present authors [9| who went on to show, that all primitive 
tree-amplitudes in massless QCD with up to four external quark lines of arbitrary flavors and 
an arbitrary number of gluons may be obtained from associated gluon-gluino trees in fA£ = 4 
Yang-Mills theory. This result was then exploited to write down explicit analytic formulae for 
all tree-level amplitudes in massless QCD with up to four quark lines. Moreover, a publicly 
available Mathematica package GGT was provided which generates all analytic tree-level 
gluon-gluino amplitudes relevant for QCD. 1 In its current version GGT directly provides all 
QCD tree amplitudes with up to six quarks. The obtained analytic formulae are very compact 
at the maximally-helicity-violating (MHV) and next-to-maximally helicity violating (NMHV) 
levels but do grow considerably in complexity with growing k for N^MHV amplitudes. 

Hence the "fA£ = 4 SUSY method" [9| for the evaluation of massless QCD trees based on 
exact formulae displays a complementary situation to the conventional Berends-Giele recur- 
sive approach for the efficient numerical evaluation of trees in that its evaluation time scales 
mildly with the parton number n but strongly depends on the number of helicity flips k of 
the amplitude considered. In contrast the Berends-Giele recursion evaluation time is indepen- 
dent of k but strongly depends on the number of partons n. The purpose of this work is a 
detailed analysis of the computation times for these two approaches as well as a test of their 
numerical accuracy. The outcome of our analysis may serve as a guideline on which imple- 
mentation should be used in order to maximally speed up the numerical implementation of 
massless QCD trees in future numerical calculations. The ability to calculate tree amplitudes 
numerically in a fast and accurate manner even for high multiplicity opens up a variety of new 
applications beyond the current uses in fixed order calculations. Given the recent progress in 
the refinement of matching and parton-shower algorithms together with extended reach of the 
LHC searches it is very likely that amplitudes involving ten or even more external partons will 
enter phenomenological studies in the future. 

A similar analysis as presented in this paper has been performed in Refs. [11]. In difference 
to Refs. [11] we focus on the comparison of a purely numerical approach with the usage of 
analytic formulae, since this has not been studied in detail before. Furthermore we compare 
the numerical accuracy of the numerical approach with the numerical evaluation of analytic 
formulae. We also note that very recently the usage of Graphics Processing Units (GPUs) for 
the evaluation of tree-amplitudes has been investigated. More details on this interesting option 
can be found for example in Refs. [12]. 



A Mathematica package for all tree-level amplitudes in N=4 SYM appeared in [ 10 1. 
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2 Description of used methods 

Tree-level gluon amplitudes in non-abelian gauge theories may be conveniently separated into 
a sum of terms, each composed of a simple prefactor containing the color indices, multiplied 
by a kinematical factor known as a partial or color-ordered amplitude. For an n-gluon ampli- 
tude one has 

iC e (l,2,3,...,n) =g n ~ 2 £ Tr(r fl <i)...r^W)A n (a(l)^( 1 )...a(n)Vn)) ) (l) 

oeS„/z„ 

with the argument i ' of the partial amplitude A n denoting an outgoing gluon of light-like 
momentum pi and helicity hi = ±1, i G [1,«]. The su(N c ) generator matrices T a> are in the 
fundamental representation, and are normalized so that Tx{T a T b ) — 8 . Finally, g is the gauge 
coupling. 

Similarly, the color decomposition of an amplitude with a single quark-anti-quark pair and 
(n — 2) gluons is 

jC e (l^,2„3,...,n) = 8 n - 2 E (rV3)...r^)).jAr(1^2 9 ,a(3),...,a(n)). (2) 

Amplitudes with two and more quark-anti-quark lines may be obtained similarly, as explained 
in ref. [13|. Though the color structure becomes more intricate in these cases, all of the 
required kinematical terms are constructed from suitable linear combinations of the color- 
ordered amplitudes for 2k external gluinos and (n — 2k) external gluons. 

Color-ordered amplitudes of massless particles are most compactly expressed in the spinor- 
helicity formalism. Here all light-like four-momenta are written as the product of two compo- 
nent Weyl spinors 

^ a& = afp u = 'k a X a , (3) 
where we take = (1,6) with 6 being the 2x2 Pauli spin matrices. The spinor indices are 
raised and lowered with the Levi-Civita tensor, i.e. X a = £ a p^ and = E^p^- An explicit 
representation is 

Wl-^--p^ipr {jfl-p*) ' ~l?+ipr{ p i + ip 2)- (4) 

In our convention all parton momenta are outgoing. The amplitudes depend on contracted 
helicity spinors 

(ij) := (hh) ■■= M ■■= Wj] ■■= e^5if 31}, (5) 

which are Lorentz invariants. 

2.1 Closed analytic formulae 

Compact analytic formulae for tree-amplitudes in the spinor helicity formalism can be classi- 
fied by the amount of helicity violation. The simplest class is the maximally-helicity-violating 
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(MHV) one. Here either two negative helicity gluons, one negative helicity gluon and one 
quark-anti-quark pair or two quark-anti-quark pairs sit at arbitrary positions within positive 
helicity gluon states of the color-ordered amplitude. These read for the zero and one fermion 
line case [14, 13 ] 



a b) 
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(6) 



3)...<nl) 

^{a- M =^{ P )^^ y (7) 

^T\a-^c q ) = -^\p)^^ y (8) 

where a,b,c G [l,n] and a~ denotes a negative-helicity gluon at position a, while fermions 
of opposite helicity with flavors A, B at positions b, c are denoted by b^ (+|) and c| (— j), 
and p = Y!i=\ Pi- Flavor becomes important for the two fermion line case where one has three 
distinct representatives depending on helicity and flavor distributions [9 | 



(9) 



where the flavor index has been omitted in the single flavor cases. 

The complexity of the closed formulae grows at the NMHV level comprising color-ordered 
amplitudes with k negative helicity gluons and 3 — k quark-anti-quark pairs embedded in a sea 
of [n + k — 6) positive helicity gluon states. In order to express the formulae in a compact way 
one needs to introduce the region momenta xf- 1 via 



x 'j 



aa - uw,., Pj ,r i w- '•<./• (i2) 

k=i 



xu = 0, and xu = —Xji for i > j. All N^MHV tree-level amplitudes can be expressed in terms 
of the quantities (na\a2 ...a^\d) defined by 

(naia 2 ...a k \a) := {n\x mi x ax a 2 ■ ■ •*a*_ 1 a Jt |a) , (13) 

and the spinor products (i j). The pure gluon NMHV amplitude with negative helicity gluons 
sitting at positions a, b and n takes the form [9] 

A NMHV fa- b~ n~) ~ §(4)(P) x 
An {a ] ~ (12)...(nl) X 
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52 R n -st((na)(nts\b)*j + £ R n;st (jb n) (n a)x 



a<s<b<t<n-\ 



a<s<t<b 
4 



+ 52 Rn-stUb a)(nts\n)\ + £ R n - st Unb)(nst\a) 

2<s<a<b<t<n-\ 2<s<a<t<b 

where we have introduced the ^-invariant 

1 (s(s-l)) (t{t-l)) 



(14) 



R 



n;st • 2 



jc^ (nts\s) (nts\s — 1) (nst\t) (nst\t— 1) 



(15) 



with ^ n;S ( := for t = s + l or s = t + 1. Let us also state NMHV amplitude with one quark- 
anti-quark pair and two negative-helicity gluons. Here there are two distinct configurations 
(we take a <b < c below) [9] 



AT HY (a q ,b-,c q ,n-) 



(12). ..(«!) 



(ab)(bc) 3 £ (/irs|n) 4 /?™, 

i<s<a,b,c<t<n 

{be) 3 {an) 52 {nts\b){nts\n) 3 R lh st 

a<s<b,c<t<n 

{cn) 3 {an) £ {nst\b) 3 {nts\b)R n , st 

a<s<b<t<c 



{cn) 3 {ab) 52 {nst\b) 3 {nts\n)R n , 



St 



(16) 



A? MHV K,^,c-,n 



8< 4 >(p) 



(12). ..{«!) 



+ {ac){bc) 3 52 

!<5<a,£>,c<f<n 



+ {an){bn) 3 52 («^|c) 4 ^„ 

/?<j<c<f<n 



b<s<t<c 

+ {cn) 4 52 {nst\b) 3 {nst\a)R n ,st 

\<s<a,b<t<c 

+ (b c) 3 {a n) 52 (nts\c) {nts\n) 3 R n _, 

a<s<b,c<t<n 



+ {cn) 4 {an) 52 x 2 st {nst\b) 3 R nv 

a<s<b<t<c 



(17) 
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The two and three fermion line amplitudes are more involved and may be found in the ap- 
pendix B of [9]. These formulae have also been implemented in the publicly available Math- 
ematica package GGT 2 , As already stated in the introduction, the current version of GGT also 
provides all QCD amplitudes with up to six quarks. 

An estimate for the evaluation time of these closed formulae for the amplitudes is the num- 
ber of terms the expression has. From this one estimates the evaluation time of the MHV 
amplitudes to be of order n since the number of spinor products to be evaluated is approxi- 
mately n. The number of terms in the formulae for the NMHV amplitudes grows as n 2 for 
large parton numbers n. Excluding the MHV prefactor the complexity of each of the terms 
is independent of the parton number, hence the asymptotic scaling in evaluation time is n 2 . 
This is competitive with the Berends-Giele recursion method which grows independent on 
the helicity distributions of the partons as n 4 , discussed in the next subsection. The NNMHV 
formulae of [9| display a growth in the number of terms as n 4 . Due to the same arguments 
as in the NMHV case we thus expect a similar performance of the NNMHV formulae as the 
Berends-Giele recursion and a detailed comparison is needed to see which method wins. Go- 
ing beyond the NNMHV level with the analytic formulae of [9| for the amplitudes appears to 
be disfavored as in general the number of terms in an N^MHV formula grows as n for large 
parton numbers. 

The closed analytic formulae of [9| for the MHV, NMHV and NNMHV with zero to three 
quark-anti-quark lines have been directly implemented in a C++ program cGGT.cpp which 
can be provided upon request. cGGT . cpp contains the straightforwardly hard-coded analytic 
formulae and a natural amount of caching is performed in order to speed up the numerical 
evaluation of the amplitudes for a given phase-space point. As such all the region momenta 
are evaluated and stored during initialization, similarly all spinor brackets are evaluated with 
the reduced spinors in Eq. (4) without the square root dependent pre-factor, which is only 
evaluated at the very end, as typically even powers of the pre-factor arise. 

2.2 Berends-Giele recursion 

In this subsection we briefly comment on a purely numerical implementation of leading-order 
scattering amplitudes in massless QCD. Since an extensive literature exists on the subject 
we limit ourselves to the basic ingredients. More details can be found in Ref. [15]. In differ- 
ence to on-shell recurrence relations developed more recently [6 ], the Berends-Giele recursion 
uses off-shell currents as basic building blocks. In pure gauge theory the off-shell currents 
Jn(\ hl ,2 hl . . .,n h ") correspond to the amplitudes for the production of n gluons with helicities 
hi and one off-shell gluon with the corresponding polarization vector stripped off. The on- 
shell scattering amplitude is thus obtained by taking the on-shell limit and contracting with 
the polarization vector of the additional gluon. As mentioned in the previous section it is con- 
venient to split the scattering amplitude into a color part and the remaining Lorentz structure. 
In practice this can be done for example by using color-ordered Feynman rules (for details 
we refer to Ref. [16]). The full amplitude will in general contain different color structures. 
However since not all of these structures are independent it is usually sufficient to calculate 



2 Included in this submission and at http : //qft .physik . hu-berlin . de 
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only a few of them and reconstruct the remaining ones by permuting the external gluons. The 
key observation leading to the Berends-Giele recurrence relation is the fact that any off-shell 
current can be written as a sum of simpler off- shell currents connected via the appropriate 
three- (V^) and four-gluon (V4 g ) vertices: 



J^l, ...,n) = ^ 



P L 

i,n 



n-l 

+ L L< PO 7v(l,...,^p0'+l,---,j)^a(j + l,---,n) 

j=i+l i=l 



(18) 



where we have suppressed the helicity index and the gauge coupling is set to one. In addition 
the definition 

i 

p i,j = J LPk for J>i ( 19 ) 

k=i 

is used, we note Pij = from (12). The color-ordered vertices are given by 
V£ P (P h P 2 ) = {g vp (Pi -P 2 r + 2g^-2 g ^P p ) , 

< P ° = ^/V°-/ V g P<T ~A Vp ). (20) 

Since the right hand side of Eq. (18) is formally simpler — only off-shell currents with a lower 
number of gluons are involved — Eq. (18) can be used to calculate off-shell currents recur- 
sively. The end-point of the recursion is given by 

Mi h ')=U hi \ Pl )Y (2D 



(h) 

where E^, {pi) denotes the polarization vector of a gluon with momentum pi and polarization 
h{. We take all the partons as outgoing, that is the on-shell limit of the scattering ampli- 
tudes correspond to the transition — > g(l hl ) • • • g(n h ")g((n + 1)' ! " +1 ). Scattering amplitudes 
for physical processes are obtained as usual by crossing. Using the explicit form of the three- 
and four-point vertices as given above, the implementation of Eq. (18 ) in a computer program 
is straight forward. As can be seen from Eq. (18) the same sub-current may appear at dif- 
ferent depths of the recursion. To speed up the numerical evaluation it is thus important to 
cache the sub-currents and evaluate them only once. We note that the possibility to reuse sub- 
currents during the calculation is a major advantage of off-shell recurrence relations compared 
to on-shell methods. Since recursive implementations tend to be sub-optimal to get high com- 
puting performance Eq. (18) is implemented as a bottom-up approach. The program uses the 
one-point currents specified by the user in terms of particular polarization states together with 
the respective momenta to calculate the two-gluon off-shell currents JJi (i+ The 
two-point currents together with the one-point currents are then used in the subsequent step 
to calculate the three-point currents. This procedure is repeated until the current of maximal 
length is obtained. Owing to our restriction to specific color structures only a fixed cyclic or- 
dering needs to be considered. One can show that if sub-currents are cached the computational 
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effort for the evaluation of an n-point current scales as n , (Without cache the scaling would 
be 4".) We will come back to this point when we discuss the numerical performance. As a 
technical detail we remark that in the implementation presented here [3] no specific bases for 
the polarization vectors has been used. In particular no helicity methods have been applied. 
Since in (almost) all phenomenological applications the gluon polarization is not observed, 
only matrix elements squared summed over all polarization states will occur. As a conse- 
quence an arbitrary bases can be used as long as the sum over all possible polarization states 
is complete. Using real polarization vectors could thus yield a significant speed up since the 
entire calculation can be done using real numbers instead of complex arithmetic. 

The extension to include also quarks — massive as well as massless ones — is straight for- 
ward. The main difference, namely that some sub-currents do not exist since there is no direct 
coupling between quarks, is merely a matter of bookkeeping. We stress that the quark cur- 
rents calculated in the way described above in general do not correspond to partial amplitudes. 
However partial amplitudes can be constructed from the aforementioned currents. The re- 
construction of the full matrix elements — not subject of this article — has been checked for a 
variety of different processes [17]. 

3 Performance and Numerical Accuracy 

The scattering amplitudes described in the previous section find their application in leading- 
order phenomenology at hadron colliders. However, this is not the only application. With 
the development of unitarity inspired techniques, leading-order amplitudes represent an im- 
portant input to the evaluation of one-loop amplitudes. In both cases the amplitudes need to 
be evaluated for millions of phase space points. The required computation time is thus an 
important factor in choosing the optimal approach. We compare the evaluation time in detail 
in Section 3.1 , 

In particular when using leading-order amplitudes in the evaluation of one-loop amplitudes, 
not only the speed but also the numerical accuracy matters. In the unitarity method the one- 
loop amplitude is reconstructed from a large number of different cuts requiring the evaluation 
of the corresponding tree amplitudes. It is thus important to assure a good accuracy of the 
individual contributions. Even in the case that analytic formulae are available one should keep 
in mind that when it comes to the numerical evaluation usually only a finite floating point 
precision is employed — unless special libraries to allow for extended precision are used. As 
a consequence, numerical cancellations between individual contributions may result in a loss 
of accuracy of the final result. Since a detailed understanding of the numerical uncertainties is 
also important when results from different methods are compared we investigate the numerical 
uncertainties of the two approaches discussed in the previous Section in Section 3.2. 



3.1 Evaluation Time 

Before discussing the results in detail we briefly describe how the runtime is analyzed. To 
investigate the performance we used a computer with 16 GByte main memory and Intel(R) 
Core(TM) i5 3.33GHz cpu running under Debian 6.04. To reduce context switches as much 
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Figure 1: Average time required per phase space point for the evaluation of pure gluon 
amplitudes as function of the partem multiplicity. 



as possible we payed attention to the fact that the computer was used exclusively for the 
performance measurements. Furthermore we used the POSIX function getrusage for the 
measurement of the used cpu time, which is to some great extent context independent. The 
function returns the time spent in user mode split into seconds and micro seconds. It is not 
documented whether the underlying clock provides a real time accuracy at the level of micro 
seconds. One can assume however that a precision at the level of milli seconds should be 
feasible which is sufficient for our purpose using the procedure described in the following. 

The key observation is that both the evaluation time of the analytical formulae and of the 
Berends-Giele recusion depend on the positions of the fermions. In the case of the analytical 
formulae we additionally have a dependence on the position of the negative helicity gluons. 
Hence, we chose to average over all configurations to which the analytical formulae directly 
apply without exploiting the cyclic symmetry of the amplitudes, e.g. all configurations with a 
negative helicity gluon at position n for amplitudes with at least one gluon of negative helic- 
ity. To obtain reproduceable results and to reduce the computational effort to a minimum we 
took the following approach: Per measurement a minimum cpu time of at least one second is 
required to obtain reliable results. Using empirical knowledge together with the known scal- 
ing of the runtime as a function of the multiplicity we estimated the number of phase space 
evaluations for each sub-process/multiplicity. We then generated one phase space point and 
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evaluated all matrix elements corresponding to our desired average the required number of 
times. While in the determination of the accuracy it is important that on-shell condition and 
four momentum conservation are respected as precise as possible, the runtime measurement 
is insensitive to the "quality" of the phase space point — as long as no floating exceptions are 
encountered. (Floating point exceptions would lead to exception handling and the execution 
of different code.) In Fig. 1 the cpu time per phase space point for pure gluon amplitudes 
is shown. We compare analytic formulae for three different helicity configurations (MHV, 
NMHV, NNMHV) with the purely numerical approach using the Berends-Giele recursion. 
Since in the implementation of the Berends-Giele recursion no helicity methods are used the 
runtime is the same for different helicity configurations. Fitting the last five data points with 
fin) = An B , where n is the gluon multiplicity, we obtain B « 4.12 which is allready quite 
close to the predicted asymptotic 0(n 4 ) behavior. We stress that this is a property of the al- 
gorithm and cannot be changed by a different implementation. The implementation can only 
affect the normalization factor in front of the n 4 behavior. Let us now compare with the run- 
time required for the evaluation of the analytic formulae. In case of the MHV amplitude the 
evaluation is more than three orders of magnitude faster for 25 gluons — as one would have 
expected given the compactness of the analytic results. We have checked that the timings 
shown for the MHV amplitudes perfectly agree with the predicted n l scaling. We emphasize 
that no time consuming square roots (contained in the spinor products) have to be evaluated 
in all gluon amplitudes since each spinor appears an even number of times. This is no longer 
true for amplitudes involving fermions as their associated spinors appear an odd number of 
times. Hence, for each fermion one square root is required. The predicted large n behavior 
of n 2 for the NMHV amplitudes is in good agreement with the n 11 fit from the last five data 
points in Fig. 1 . This is still much better than the n 4 of the Berends-Giele approach. As a 
consequence for large multiplicities the analytic results are almost two orders of magnitude 
faster than Berends-Giele. The situation changes when it comes to the NNMHV amplitudes. 
From the number of terms in the analytic expression we expect an asymptotic behavior of 
the form n 4 leading to a similar rise of the runtime as a function of the gluon multiplicity as 
observed in the Berends-Giele case. However, fitting the last five data points reveals that, with 
a scaling of n 4 - 5 the analytic formulae are still farther away from the asymptotic behavior than 
Berends-Giele. Consequently for 15 gluons and more the Berends-Giele recursion starts to 
become more efficient. As mentioned already, the asymptotic behavior is a property of the 
underlying algorithm and cannot be changed by a 'more clever' implementation. 

Let us add at that point a remark concerning the absolute timings: For low multiplicities 
the evaluation time is of the order of micro seconds while for n = 25 order milli seconds are 
required. For practical applications one should keep in mind, that the timings are for spe- 
cific color and spin configurations. While for low multiplicities the number of color and spin 
configurations is still small (i.e. for n < 5 only MHV amplitudes exist) one can expect that 
color and spin summed squared amplitudes can be evaluated in less than one milli second per 
phase space point. However for large multiplicities the number of color and spin configura- 
tions grows rapidly. A naive sum over color and spin would thus give an additional factor 
which would render a brute force evaluation impossible given today's computing resources. 
In such cases refined methods like for example Monte Carlo sums over spins and colors would 
be required. 
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Figure 2: Evaluation time per phase space point for amplitudes with a quark-anti-quark 
pair and N — 2 gluons. 



In Fig. 2, Fig. 3 and Fig. 4 we show the results of a similar analysis, now for amplitudes 
involving up to three quark-anti-quark pairs. Again the Berends-Giele recursion method is 
presented only for a fixed number of negative helicity gluons since our implementation is 
independent of the gluon helicities. However, to take into account that the runtime depends on 
the position of the quarks in the primitive amplitude we took the same configuration average as 
for the corresponding analytic formula of smallest MHV degree. Overall we observe a picture 
similar to the pure gluon case: for MHV and NMHV amplitudes the analytic results are much 
faster than the evaluation based on the Berends-Giele recursion. Comparing the performance 
of the Berends-Giele recursion for 0, 2, 4, 6 quarks we find a decreasing dependence on the 
parton multiplicity. This is simply due to the fact that for a fixed multiplicity the number of 
currents which have to be evaluated decreases if more fermions are involved. Since the n 4 
asymptotic of the recursion is due to the four gluon vertex, we expect that the asymptotic 
scaling will be approached from below. Indeed, for two, four, six quarks we get n 3 96 , n 3 - 83 , 
n 3.64 f rom i ast fj ve d ata points compared to n 3 77 , n 3 43 , n 319 for up to n = 15 partons. The 
timings of the analytical formulae show only a small dependence on the number of quarks. 
As a consequence the Berends-Giele recursion is more efficient for the NNMHV amplitudes 
involving quarks. In case of all MHV amplitudes it is remarkable that the analytic formulae 
for MHV amplitudes show a very weak dependence on the parton multiplicity. The evaluation 
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Figure 3: Evaluation time per phase space point for amplitudes with two quark-anti- 
quark pairs (different flavors) and N — 4 gluons. 

of an MHV amplitude for n = 25 takes only 6 x 10~ 7 s longer than the evaluation of the four 
point amplitude. This is easily understood from the structure of formulae since increasing the 
multiplicity by one results only in the additional evaluation of two reduced spinor products 
and one squared spinor normalization factor. 

3.2 Numerical Accuracy 

Understanding the numerical accuracy is crucial for numerical cross section evaluations. In 
cases where analytic results are available it is possible to assess the accuracy of purely nu- 
merical approaches by comparing with analytic results. However the numerical evaluation 
of analytic formulae may also be affected by numerical instabilities. Furthermore a reliable 
method is also required for situations where no analytic results are available. In [3| the so- 
called scaling test was proposed. When applying the scaling test the scattering amplitudes 
are calculated twice for a given phase space point: for each phase space point the scattering 
amplitudes are calculated for the given momentum configuration. The evaluation is then re- 
peated for a re-scaled set of momenta. Since the corresponding effective operators are not 
renormalized no anomalous dimension appears. The two evaluations are thus related by their 
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(N-6) gluon 6 quark amplitudes 
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Figure 4: Evaluation time per phase space point for amplitudes with three quark-anti- 
quark pairs (different flavors) and N — 6 gluons. 



naive mass dimension: 

A n (pi,p2,...,p n ) =* w_ A„(xp\,xp2, ■■■,xp„). (22) 

As was pointed out in [3 ] using a value for x which is not a power of 2 will lead to a differ- 
ent mantissa in the floating point representation and thus to different numerics. The method 
thus allows to assess the size of rounding errors. To estimate the numerical uncertainties we 
have applied the scaling test for a large number of phase space points. As a measure for the 
uncertainty we have evaluated for each phase space point the quantity 8: 

Ai-A 2 



6 = log 10 2 



Ai+A 2 



(23) 



where A i denotes the result of the amplitude evaluation for unsealed momenta while A 2 is cal- 
culated from Eq. (22). The quantity |8| gives a measure for the valid digits in the evaluation, 
i.e. a value of \d\ = 3 would mean that we expect ~ 3 digits to be correct. As an example we 
show in Fig. 5 results for the 25 gluon amplitude. In case of the Berends-Giele recursion the 

alternating helicity configuration H 1 h ... is evaluated. The remaining three histograms 

show results using analytic formulae for MHV, NMHV, and NNMHV amplitudes. The phase 
space points are generated using a sequential splitting algorithm as described in [ 1 8 1 . This al- 
gorithm does not produce a flat distribution in phase space. In fact collinear configurations are 
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25-gluon amplitudes: 
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Figure 5: Accuracy 8 for 25 gluon amplitude for purely numerical evaluation based 
on the Berends-Giele recursion (BG) and for analytic formulae (GGT) as 
described in Section 2. 1 Phase-space generation by sequential splitting. 



preferred. We note that we always use a couple of default cuts based on the JADE jet algorithm 
to avoid singular regions in the phase space. In particular we require (2pi ■ pj)/s > 10~ 10 . We 
emphasize that as a consequence of the sequential splitting algorithm collinear configurations 
will dominate for multiplicities greater than 15, e.g. for N = 20 almost all phase space points 
have a collinearity of 10 10 . It follows from Fig. 5 that most of the phase space points are 
evaluated with a precision better than 5 valid digits — largely sufficient for any practical ap- 
plication at hadron colliders. Since we are mainly interested in a comparison between the 
purely numerical approach and the usage of analytic formulae for different parton multiplic- 
ities we have calculated an average accuracy for different parton multiplicities and different 
helicity configurations. The result is shown in Fig. 6, First of all we observe that in general 
the analytic formulae perform better as far as the accuracy is concerned. Furthermore it can be 
seen that for the analytic formulae the accuracy degrades when we move to the more complex 
configurations as for example the NNMHV amplitudes. This has two reasons. First of all the 
corresponding formulae are more involved and are thus more difficult to evaluate numerically. 
The second reason is due to numerical cancellations between individual terms which leads to a 
loss of accuracy. This is supported by the observation that the NNMHV split helicity where no 
cancellation occurs is almost as accurate as the MHV formula. In the worst case the accuracy 
is only marginally better than what we observe in the purely numerical case. For the Berends- 
Giele recurrence relations we show only one helicity configuration since no helicity methods 
are used as mentioned in the previous section. Describing the gluon polarization using a four- 
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Numerical precision of N gluon amplitudes 
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Figure 6: Average accuracy |8| for pure gluon amplitudes as a function of the gluon 
multiplicity (GGT analytic formulae, BG Berends-Giele). Phase-space gen- 
eration by sequential splitting. 



vector the naive expectation would be that all helicity configurations should perform similar. 
A more detailed analysis shows a mild dependence on the helicity configuration as can be seen 
in Fig. 7. To assess the dependence of the aforementioned results on the phase space gener- 
ation we show in Fig. 8 the average accuracy for a flat phase space generation obtained by 
using the algorithm RAMBO described in [ 19 1. Comparing Fig. 7 and Fig. 8 we observe two 
important differences: First of all the dependence of the average accuracy on the helicity con- 
figuration is now more pronounced than in the case where collinear phase space configurations 
were preferred. Second we observe that in particular cases the accuracy can compete with the 
numerical evaluation using analytic formulae. Our understanding of the observed pattern is 
the following: Particular sub-amplitudes or even entire amplitudes may vanish due to "helic- 
ity conservation". In the numerical approach this 'zero' is 'calculated' from a combination 
of individual non-zero contributions. Depending on the helicity configuration the cancellation 
may appear earlier or later in the recursion affecting through accumulated rounding errors the 
accuracy of the final result — leading to the observed helicity dependence of the average ac- 
curacy. In case that collinear phase space configurations are preferred a second effect becomes 
important: It is well known that scattering amplitudes in pure gauge theory show only square 
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Figure 7: Average accuracy of the amplitude evaluation using Berends-Giele recursion 
for phase space generation with sequential splitting. 



root singularities (1/ y/ptPj) for collinear phase space configurations (pi\\pj). On the other 
hand individual Feynman diagrams show a more singular behavior. In the numerical approach 
the square root behavior is obtained through a numerical cancellation of the leading behavior. 
It is obvious that this leads to a loss of accuracy. In the extreme case for highly collinear 
configurations the 15 digits precision is no longer sufficient to evaluate a meaningful result. 
Using phase space configurations which tend to be collinear the second effect will dominate 
the average accuracy. As a consequence we observe in Fig. 7 only a mild dependence of the 
average accuracy for different helicity configurations together with an equally 'bad' overall 
accuracy. (One should keep in mind at this point that for all practical applications in collider 
phenomenology 8 significant digits are largely sufficient.) As far as the analytic formulae are 
concerned we observe an opposite effect: In case of a flat phase space generation no particular 
cancellation in the analytic formulae is present. As a consequence we observe a very high av- 
erage accuracy close to the maximum of about 15 digits as one would have expected. However 
studying collinear configurations leads also in the analytic formulae to cancellations between 
individual terms. As a consequence the average accuracy degrades in that case. 

One could argue that a flat phase space generation would be more appropriate to investigate 
the average accuracy. However we believe that for practical applications the average accuracy 
evaluated in that way would be less meaningful. In phenomenological applications the cross 
sections will get important contributions from collinear configurations. Using Monte Carlo 
methods for the cross section evaluation collinear events will thus dominate the total result. In 
an ideal situation this would be taken into account through the phase space integrator by pre- 
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Figure 8: Average accuracy of the amplitude evaluation using Berends-Giele recursion 
for flat phase space generation. 



ferring collinear configurations. This reasoning is also supported by the empirical observation 
that using RAMBO for the cross section evaluation usually leads to a poor performance of the 
Monte Carlo phase space integration in terms of computational effort and achieved integration 
accuracy. 

For completeness we analyzed also the average accuracy for amplitudes involving massless 
quarks. The result is shown in Fig. 9. Again we have used a phase space generation preferring 
collinear events. As far as the numerical approach is concerned the result looks similar to the 
pure gluon case. This is just a consequence of the basic fact that the recurrence relation is 
very similar apart from the spin dependence. Since some vertices do not exist in the quark 
case the mixed amplitudes contain less terms and are slightly more precise. Concerning the 
analytic formulae we observe that the accuracy is not as good as in the pure gluon case. Our 
naive understanding is again that the corresponding formulae are more involved requiring 
more floating point evaluations and leaving more room for (unwanted) cancellations in the 
case of collinear phase space configurations. 



4 Conclusions 

QCD tree amplitudes are of great interest. The detailed analysis of their analytic structure may 
lead to a more profound understanding of SU(N) gauge theories exposing further symmetries 
undiscovered so far. Concerning phenomenological applications tree amplitudes represent an 
important input for cross section evaluations in Born approximation and beyond. In this work 
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Figure 9: Average accuracy for amplitudes involving quarks. Phase-space generation 
by sequential splitting. 



we have analyzed two different approaches to evaluate tree amplitudes. We have compared the 
numerical performance of a purely numerical approach based on the Berends-Giele recursion 
with the numerical evaluation of analytic formulae. In detail we find that MHV and NMHV 
amplitudes are most efficiently calculated using analytic formulae. For NNMHV amplitudes 
and beyond we find the purely numerical approach more efficient. We have also investigated 
the numerical accuracy. In general the numerical accuracy of the analytic formulae (evaluated 
numerically) is superior compared to the purely numerical approach. However we find that 
close to exceptional phase space configuration (soft/collinear configurations) analytic formu- 
lae suffer also from rounding errors. In both approaches we find even for large multiplicities 
an average accuracy of at least 9 digits — sufficient for phenomenological applications. 
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